Search for: All records
- Total Resources: 4
- 
            Large language models’ (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation. However, decisions around what data is retained or removed during this initial stage are underscrutinized. In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. We create a new dataset of 10.3 million self-descriptions of website creators, and extract information about who they are and where they are from: their topical interests, social roles, and geographic affiliations. Then, we conduct the first study investigating how ten “quality” and English language identification (langID) filters affect webpages that vary along these social dimensions. Our experiments illuminate a range of implicit preferences in data curation: we show that some quality classifiers act like topical domain filters, and langID can overlook English content from some regions of the world. Overall, we hope that our work will encourage a new line of research on pretraining data curation practices and its social implications.
- 
            Elazar, Yanai; Bhagia, Akshita; Magnusson, Ian; Ravichander, Abhilasha; Schwenk, Dustin; Suhr, Alane; Walsh, Pete; Groeneveld, Dirk; Soldaini, Luca; Singh, Sameer; et al. (ICLR)
- 
            Adelani, David; Ibn Alam, Md Mahfuz; Anastasopoulos, Antonios; Bhagia, Akshita; Costa-jussà, Marta R.; Dodge, Jesse; Faisal, Fahim; Federmann, Christian; Fedorova, Natalia; Guzmán, Francisco; et al. (Association for Computational Linguistics)
- 
            Gardner, Matt; Merrill, William; Dodge, Jesse; Peters, Matthew; Ross, Alexis; Singh, Sameer; Smith, Noah A. (Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing)
            Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have “spurious” instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems. For example, the word “amazing” on its own should not give information about a sentiment label independent of the context in which it appears, which could include negation, metaphor, sarcasm, etc. We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account, showing that realistic datasets will increasingly deviate from competency problems as dataset size increases. This analysis gives us a simple statistical test for dataset artifacts, which we use to show more subtle biases than were described in prior work, including demonstrating that models are inappropriately affected by these less extreme biases. Our theoretical treatment of this problem also allows us to analyze proposed solutions, such as making local edits to dataset instances, and to give recommendations for future data collection and model design efforts that target competency problems.